SUMMARY AND RECOMMENDATIONS


Ratings  on  the  Interview  Evaluation  Form  used  with  applicants  for 
CNA  support  jobs  are  highly  consistent  within  sets  of  interviewers,  but 
unreliable  between  sets  of  interviewers.  Their  validity  for  predicting 
later  job  performance  is  modest  at  best. 

A more structured interview evaluation form should be sought, and
several interviewers trained to use it. This will ensure more standardized
treatment of applicants and useful data for future reliability
and validity evaluation.


PURPOSE

The Interview Evaluation Form shown on the next page has been
used with applicants for CNA support positions. It contains 10 items
on specific background characteristics and interview behaviors that
are rated on a scale ranging from superior through unacceptable. Another
item is used to evaluate the applicant on the degree to which he or
she meets the requirements of the position, and a final item is the
interviewer's recommendation to hire, hold, or reject the applicant.

Since  an  interviewer  can  influence  an  applicant's  behavior,  it 
is  important  to  know  if  the  interviewer's  behavior  differs  from  applicant 
to  applicant  and  if  interviewers  differ  from  each  other.  Therefore, 
we  will  try  to  evaluate  both  interviewer  consistency  and  the  agreement 
between  interviewers,  or  reliability.  Finally  we  will  evaluate  the 
relationship  between  interviewer  ratings  and  later  performance  ratings, 
or  validity. 

In general, the literature shows that both the reliability and
validity of interview data vary widely (Blum and Naylor, 1968, pp. 153-
154). Thus, it is important that we evaluate consistency, reliability,
and validity with the data at hand, to the extent possible.


INTERVIEW EVALUATION FORM

Applicant's Name ____________________    Position ____________________

Interviewer _________________________    Date ________________________

                                             3            2             1
The applicant's                           Superior    Acceptable   Unacceptable

 1. personal impression.                    [ ]          [ ]           [ ]
 2. self-confidence.                        [ ]          [ ]           [ ]
 3. common sense.                           [ ]          [ ]           [ ]
 4. adaptability and flexibility.           [ ]          [ ]           [ ]
 5. ability to communicate.                 [ ]          [ ]           [ ]
 6. answers to questions.                   [ ]          [ ]           [ ]
 7. work experience and/or educational
    training relative to the position.      [ ]          [ ]           [ ]
 8. reasons for past changes in
    employment.                             [ ]          [ ]           [ ]
 9. enthusiasm and interest.                [ ]          [ ]           [ ]
10. skills that are required by the
    particular position.                    [ ]          [ ]           [ ]

General Comments (include specific strengths and weaknesses):

[ ] Exceeds          [ ] Meets Minimum      [ ] Does Not Meet
    Requirements         Requirements           Minimum Requirements

Recommendations:

[ ] Make offer subject to     [ ] Reject.     [ ] Hold for future
    reference checks.                             reference.


APPLICANTS AND DATA

There were 37 applicants interviewed from January 1970 through
March 1971 for whom interview and performance evaluation data were
available. The variability of the ratings is less than would be
expected, since rejected applicants were not included.

The interview ratings were coded 3, 2, and 1 as shown on the
form. Although this restricts somewhat the variation in the original
ratings, it clearly demarcates the limits or areas of the marks along
the scales.

Below is a table of the available ratings and other data:

1. Interview Evaluation Form:

   Number of interviewers    Number of      Number of
   per applicant             applicants     interviews
   1                              7              7
   2                             14             28
   3                             16             48
   Total                         37             83

2. Ratings by rating set:

                        Items 1-11         Item 12
   Rating Set    N      Mean    S.D.       Mean    S.D.
   1st           37     2.5     .5         2.9     .3
   2nd           30     2.5     .6         2.9     .4
   3rd           16     2.5     .5         2.9     .3

3. Performance rating (sum of 10 separate ratings) after 6 months of
   employment: Mean of 38, Standard Deviation of 7.

4. Sex: 26 women and 11 men

5. Selection tests:

                 N                   Mean    Standard deviation
   Verbal        35                  32      11
   Clerical      33                  32       5
   Numerical     34                  32      11
   Typing        22 (all women)      55      17

It  is  important  to  note  that  the  first  (or  second  or  third)  set  of 
ratings  was  not  necessarily  made  by  the  same  interviewers.  This  will 
cause  some  problems  in  interpreting  the  results.  However,  the  means 
and  standard  deviations  of  items  1-11  were  very  similar  for  all  three 
sets  of  ratings,  as  were  these  same  statistics  for  item  12,  the  hiring 
recommendation.

The ratings on the first 11 items average midway between superior
and acceptable, and two-thirds of them fall within this range. Ratings
on item 12, the hiring recommendation, average just below the top
of the superior category and vary even less. Had rejected applicants
been included in the sample, the means would have been lower and the
variabilities higher. However, 6-month performance evaluations would
not have been available for rejects, and one of our purposes is to
determine the validity of the interview ratings for predicting performance.

The mean and standard deviation of the performance rating (actually
a sum of ratings on 10 items) are very close to those of a sample of 114
CNA support employees used earlier in a validation of CNA selection
tests (Lockman, 1971), so the present sample is not atypical insofar
as performance is concerned.

The Verbal and Numerical selection tests for our sample are also
very similar in mean and standard deviation to those of the sample of
CNA support employees. However, our applicant sample has a mean 8 points
lower and a standard deviation 10 points less than the larger sample.


ANALYSES

Two statistical techniques were used to evaluate the reliability
of the interview items across the first, second, and third sets of ratings:
product-moment correlation coefficients were computed for each pair of
ratings for each item, and Ebel's estimate of rating reliability was
computed for each item (Ebel, 1951). (The latter is an intraclass
correlation coefficient calculated by an analysis of variance of the
ratings. In our case, only two components of the variance, attributable
to applicants and error, were separated. The variance between
raters was included in the error term because in practice decisions
are made by comparing the ratings assigned to different applicants
by different interviewers, the procedure recommended by Ebel under
these circumstances.)
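
The two-component analysis of variance described above can be sketched as follows. This is a minimal illustration, not the computation used in the study: the function name and the small rating matrix are hypothetical, and the sketch assumes every applicant was rated by the same number of interviewers, which was not the case in the sample (applicants had one, two, or three ratings).

```python
def ebel_reliability(ratings):
    """One-way ANOVA intraclass correlation for a complete ratings matrix.

    ratings: list of rows, one row per applicant, one column per rater.
    Rater variance is left in the error term, as described in the text.
    Returns (reliability of a single rating, reliability of the mean of k ratings).
    """
    n = len(ratings)          # number of applicants
    k = len(ratings[0])       # ratings per applicant (assumed equal for all)
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-applicants and within-applicants (error) sums of squares
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    averaged = (ms_between - ms_within) / ms_between
    return single, averaged


# Hypothetical ratings on the 3-2-1 scale: 4 applicants, 2 interviewers each
example = [[3, 2], [2, 3], [1, 1], [3, 3]]
single, averaged = ebel_reliability(example)
print(round(single, 2), round(averaged, 2))
```

Because rater differences are counted as error, the coefficient answers the practical question of how well one interviewer's rating of one applicant can be compared with a different interviewer's rating of another applicant.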

A principal components analysis of the 12 ratings was made to
see how many different kinds of characteristics or factors were really
being measured (Nunnally, 1967, chap. 9). We expected that 12 distinct
kinds were not being measured, and that many of the items would correlate
highly with one another. If this is true, we can reduce the ratings to
fewer factors composed of similar but independent groups of items.
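
As a rough illustration of what the first principal component provides, the sketch below extracts the largest component from a correlation matrix by power iteration and converts it to item loadings and a percent of trace. The function name is hypothetical, and the input is an artificial matrix in which every pair of 11 items correlates .62; it is not the study's correlation matrix.

```python
import math


def first_principal_component(corr, iterations=500):
    """Return (eigenvalue, loadings) of the largest principal component.

    corr: square correlation matrix as a list of lists.
    Loadings are the correlations of each item with the component.
    """
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n              # starting vector of unit length
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n))
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings


# Artificial example: 11 items, every off-diagonal correlation equal to .62
items = 11
corr = [[1.0 if i == j else 0.62 for j in range(items)] for i in range(items)]
eigenvalue, loadings = first_principal_component(corr)
percent_of_trace = 100.0 * eigenvalue / items   # trace of a correlation matrix = number of items
print(round(eigenvalue, 2), round(loadings[0], 2), round(percent_of_trace, 1))
```

When all items load highly on one component and that component accounts for most of the trace, the items can reasonably be treated as measures of a single characteristic.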

Finally,  product-moment  correlation  coefficients  were  computed 
between  the  various  ratings,  rating  factors,  sex,  and  the  selection  test 
scores,  on  the  one  hand,  and  the  performance  rating,  on  the  other  hand. 
This  provides  us  with  an  indication  of  the  predictive  validities  of  the 
rating  and  other  selection  variables. 


RESULTS

Reliability

The  table  below  shows  the  mean  correlations  of  the  available  pairs 
of  ratings  for  each  item,  along  with  the  Ebel  reliability  estimate. 


            Mean      Ebel
Item        r         rxx
 1          .34       .30
 2          .27       .24
 3          .35       .31
 4          .37       .27
 5          .28       .17
 6          .41       .26
 7          .42       .39
 8          .23       .24
 9          .29       .29
10          .42       .44
11          .27       .24
12          .21       .21
Median      .32       .28

The median of the 36 pairs of correlations is .32 compared with
the median Ebel coefficient of .28. In fact, the rank-order correlation
between the mean correlations and Ebel coefficients is .81.
The striking feature of these statistics is their low magnitude, due
mainly to the small variation in the ratings to begin with.
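
The rank-order correlation can be checked from the twelve item values tabulated above. The sketch below is one way to do so, assuming tied values receive the average of their ranks; the function names are illustrative and the original computation may have handled ties differently.

```python
import math


def average_ranks(values):
    """Rank values from low to high, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0    # average of the 1-based positions
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_order_correlation(x, y):
    """Spearman rank-order correlation: Pearson r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Item values from the reliability table (items 1 through 12)
mean_r = [.34, .27, .35, .37, .28, .41, .42, .23, .29, .42, .27, .21]
ebel = [.30, .24, .31, .27, .17, .26, .39, .24, .29, .44, .24, .21]
print(round(rank_order_correlation(mean_r, ebel), 2))  # about .81
```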


If we look at the mean correlations between pairs of ratings,
we find a similar picture:

Ratings of the      Number of     Mean
same applicant      applicants    r        Range of r
1st with 2nd        30            .26      .47 to .05
1st with 3rd        16            .42      .63 to .22
2nd with 3rd        16            .28      .71 to -.10

It appears, then, that rater agreement or between-rater reliability is
not very high, but again the small variation of the ratings is a major
cause.

If we look at the percentage of perfect agreements in ratings on
item 12, the hiring recommendation, we get a different picture (chance
in this case would be 35 percent):

1st with 2nd ratings    26/30 = 82%
1st with 3rd ratings    14/16 = 82%
2nd with 3rd ratings    13/16 = 81%


This kind of analysis was not carried out on the other 11 items for two
reasons: the means and standard deviations of the ratings were very
similar, and the factor analysis of the 12 items showed that the first
11 were measuring the same thing to about the same degree. Their average
intercorrelation was .62 and their loadings or correlations with the
general factor ranged from .70 to .90, as shown below. (Note that the
hiring recommendation does not relate to this factor.)


Item                                Loading
 1. Personal impression             .92
 2. Self-confidence                 .77
 3. Common sense                    .84
 4. Adaptability                    .88
 5. Ability to communicate          .83
 6. Answers                         .82
 7. Experience/training             .70
 8. Reasons for changes             .71
 9. Enthusiasm                      .73
10. Relevant skills                 .77
11. Degree meets requirements       .80
12. Hiring recommendation           .23

Percent of trace                    75

Next  we  looked  at  the  means  and  standard  deviations  of  the  three 
sets  of  summed  ratings  across  items  1-10  and  the  average  of  them.  The 
results  follow: 


                      Mean of items 1-10         S.D. of items 1-10
Rating Set     N      Mean     Range             Mean     Range
1              37     2.54     2.62 to 2.32      .53      .61 to .49
2              30     2.52     2.62 to 2.41      .57      .63 to .56
3              16     2.51     2.69 to 2.33      .51      .62 to .48
Average        37     2.52     2.59 to 2.39      .43      .49 to .40

There is little variation either within or between the three sets of
ratings for the first 10 items. (We did not include item 11 here,
since it was rated on a different scale than the first 10 items; rather,
item 11 and item 12, the hiring recommendation, will be used separately.)

Because the first 10 items were measuring the same thing and
because their standard deviations were very similar, we summed them
for each applicant to produce a composite in which the items were
weighted as a function of their standard deviation, about equally. We
called this composite "Overall Interviewer Impression." In theory, its
reliability should be higher than any of its item parts. To determine
if this was so, we treated each set of ratings as item samples whose
reliability will depend entirely on the average correlation among the
items and the number of items (Nunnally, 1967, p. 19*).* The pertinent
results are shown below:


                      Mean     Internal
Rating set     N      r        reliability
1              37     .61      .94
2              30     .62      .94
3              16     .52      .94

From these results, we conclude that the composite or sum of ratings
on interview items 1 through 10 has a high degree of internal reliability.
This does not imply that the composite ratings necessarily agree highly
with one another, only that they are internally consistent or homogeneous,
another way of saying they measure the same thing.
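
The dependence of internal reliability on the number of items and their average intercorrelation can be illustrated with the standard formula for the reliability of a sum of k items with mean intercorrelation r, namely k r / (1 + (k - 1) r). The sketch below applies it to the first rating set; the function name is illustrative.

```python
def composite_reliability(k, mean_r):
    """Reliability of a sum of k items whose mean intercorrelation is mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# First rating set: 10 items with a mean intercorrelation of .61
print(round(composite_reliability(10, 0.61), 2))  # 0.94
```

A single item (k = 1) has a reliability equal to the correlation itself, so the gain comes entirely from summing many items that correlate with one another.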


* $r_{kk} = \dfrac{k\bar{r}}{1 + (k-1)\bar{r}}$

where $k$ is the number of items and $\bar{r}$ is their mean intercorrelation.

We eliminated the third set of ratings because they were based
on only 16 applicants. The correlation of the composite between sets
one and two is .37, indicating that the overall agreement (or between-rater
reliability) is still rather low, albeit higher than the average
of the individual items. Although the sets of ratings are highly consistent,
reliability between sets of ratings is moderate at best. This
may in part be due to the fact that more than one rater was involved
in each set of ratings. Because of unreliability, we can expect different
correlations among the rating composite, selection tests, recommendations,
and performance ratings for the two sets of ratings. Further,
the obtained correlations will be limited in size by the low reliability.

Validity 

Now, we turn to the question of the validity of the composite
ratings, selection test scores, and sex for predicting performance
ratings six months after the applicants were hired. The table below
contains the intercorrelations of these variables for the first and
second sets of ratings and their average. (The correlations among sex,
test scores, and performance are shown under the dashed line in the table.
They are not affected by the ratings or by differences among raters.)

1st Rating Set: Correlations (N = 37)

                                 Req.   Rec.   Sex    V             C      N      T*     Perf.   p
Composite rating (items 1-10)    .53    .21    .62    [illegible]   .43    .18    .28    .20     .20
Requirement (item 11)                   .12    .35    .34           .04    .19   -.08    .10
Recommendation (item 12)                      -.19   -.06           .05   -.05    .12    .02

2nd Rating Set: Correlations (N = 30)

                                 Sex    V      C      N      T*     Perf.   p
Composite rating                 .51    .29    .31    .44    .38    .30     .10
Requirement                      .50    .50    .25    .45    .14    .25
Recommendation                  -.01   -.08    .28    .11    .05    .31

Average of 1st and 2nd Rating Sets: Mean Correlations

                                 Req.   Rec.   Sex    V      C      N      T*     Perf.
Composite rating                 .68    .10    .56    .28    .37    .31    .33    .24
Requirement                             .14    .43    .42    .15    .33    .03    .18
Recommendation                                -.10   -.07    .17    .03    .09    .17
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
                                               V      C      N      T*     Perf.
Sex                                            .58    .51    .29    *      .28
Verbal Test                                           .21    .22   -.15    .22
Clerical Test                                                .30    .26    .05
Numerical Test                                                      .16    .12
Typing Test                                                                .00

*Only women took the typing test.

We expected differences in the correlations among the interview
ratings variables for sets one and two, but there are similarities
beyond those that could be expected by chance. Looking at the averaged
correlations, we find the following:

1. Composite rating correlates .68 with the rating on the
degree to which the applicant meets the requirements of
the job for which he or she is applying. This is logical
if we view the requirements rating as an overall assessment
of the applicant's suitability that should relate to the
ratings of background characteristics and behavior in
the interview. We have also shown that item 11 loads highly
on the general factor measured also by items 1 through 10.

2. Composite rating correlates .56 with sex, indicating that
women are rated higher than men, perhaps a defensible bias
if the job openings are viewed by interviewers as more suitable
for women than men. The requirement rating also correlates
with sex, .43.

3.  Sex  correlates  .42  with  scores  on  the  Verbal  aptitude  test. 
Past  research  has  shown  that  women  score  higher  than  men 
on  this  test. 

4. Sex correlates .28 with performance rating (p = .10), a small
relationship that has been found before.

5. Composite rating correlates .24 with performance rating
(compound p = .10). This is a small relationship or
validity, but the sample of applicants is not large, the
range of the interview ratings is limited, and the between-
rater reliability is low, so it is perhaps about all we
could expect.

It is interesting to note that of all the variables, only
sex and composite rating correlate significantly with performance
in this sample, and that they also correlate highly
with one another.

6. The hiring recommendation item does not correlate with any
of the other variables, including performance. It is the
least reliable of all 12 ratings, with both an average correlation
and Ebel reliability of .21. It is also limited in
range, since no applicant was rejected.
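
The "compound p" reported in finding 5 is not defined in the text. One common way to combine significance levels from separate tests is Fisher's method, sketched below. That this method was used here is an assumption, and the input values of .20 and .10 are illustrative significance levels for the composite-performance correlation in the first and second rating sets, not figures the text states outright.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom; for even degrees of freedom the upper-tail
    probability has the closed form used below.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)   # half of the chi-square statistic
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


# Illustrative (assumed) significance levels for the two rating sets
print(round(fisher_combined_p([0.20, 0.10]), 2))  # 0.1
```

Note that the two rating sets were made on largely the same applicants, so the tests are not strictly independent, which Fisher's method assumes.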
CONCLUSIONS

1.  The  12  items  on  the  Interview  Evaluation  Form  for  CNA  support 
personnel  have  low  individual  reliabilities  (averaging  around 
.30). 

2. The first 10 of these items, ratings of the applicant's
background and interview behavior, all measure the same thing
or factor to about the same degree. This general factor was called
"Overall Interviewer Impression" and can be obtained simply
by summing the ratings for these items.

3.  The  sum  of  these  10  ratings  has  a  very  high  degree  of  internal 
consistency  (.94),  but  a  very  modest  degree  of  reliability 
between  sets  of  ratings  (.37)  -  albeit  somewhat  higher  than 
that  of  the  average  item  ratings. 

4. The sum of the 10 ratings correlates fairly highly with sex
(.56) in favor of women.

5. Both the sum of the 10 ratings and sex have significant
correlations with performance ratings after six months on
the job, but the practical value of these relationships is
low.